Technical Report: Output Privacy Protection in Stream Mining

Authors

  • Ting Wang
  • Ling Liu
Abstract

Privacy preservation in data mining demands protecting both input and output privacy. The former refers to sanitizing the raw data itself before performing mining. The latter refers to protecting the mining output (model or pattern) against malicious pattern-based inference attacks. Preserving input privacy does not necessarily preserve output privacy. This work studies the problem of protecting output privacy in the context of frequent pattern mining over data streams. After exposing the privacy breaches that exist in current stream mining systems, we propose Butterfly, a lightweight countermeasure that can effectively eliminate these breaches without explicitly detecting them, while minimizing the loss of output accuracy. We further optimize the basic scheme by taking into account two types of semantic constraints, aiming to maximally preserve utility-related semantics while maintaining the hard privacy and accuracy guarantees. We conduct extensive experiments over real-life datasets to show the effectiveness and efficiency of our approach.
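To make the notion of a pattern-based inference attack concrete, the following minimal Python sketch shows how an adversary might combine published itemset supports through inclusion-exclusion to recover the support of a pattern that was never published. The abstract does not give the details of the attack or of Butterfly, so everything here is an illustrative assumption: the function names `derived_support` and `perturb`, the toy supports, and the bounded random noise, which is a generic stand-in for output perturbation and not the authors' scheme.

```python
import random
from itertools import combinations


def derived_support(supports, include, exclude):
    """Support of the pattern "every item in include is present and no item
    in exclude is present", computed by inclusion-exclusion over published
    itemset supports. Hypothetical helper, for illustration only."""
    include = frozenset(include)
    exclude = tuple(exclude)
    total = 0
    for k in range(len(exclude) + 1):
        for subset in combinations(exclude, k):
            total += (-1) ** k * supports[include | frozenset(subset)]
    return total


def perturb(supports, max_noise, rng):
    """Add bounded integer noise to each published support.
    A generic stand-in for output perturbation, not the Butterfly algorithm."""
    return {
        itemset: count + rng.randint(-max_noise, max_noise)
        for itemset, count in supports.items()
    }


# Toy published output for one window of a stream (assumed values).
published = {
    frozenset("a"): 20,
    frozenset("ab"): 19,
    frozenset("ac"): 18,
    frozenset("abc"): 18,
}

# Records that contain a but neither b nor c: 20 - 19 - 18 + 18.
hidden = derived_support(published, include="a", exclude="bc")

# With noisy supports, the derived value is no longer exact.
noisy = perturb(published, max_noise=2, rng=random.Random(0))
noisy_hidden = derived_support(noisy, include="a", exclude="bc")
```

In this toy window the derived support is 1, so a single record matches the unpublished pattern, which is the kind of breach the abstract refers to. Note that noise on each of the four supports accumulates in the derived value, which suggests why perturbation can blur such inferences while keeping each published support close to its true value.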

Similar resources

Protecting Output Privacy in Stream Mining

Privacy preservation in data mining demands protecting both input and output privacy. The former refers to sanitizing the raw data itself before performing mining. The latter refers to preventing the mining output (model/pattern) from malicious pattern-based inference attacks. The preservation of input privacy does not necessarily lead to that of output privacy. This work studies the problem of...

Privacy-preserving Clustering of Data Streams

Most previous studies on privacy-preserving data mining have focused on securing massive amounts of data in a static database; as a result, data that undergoes privacy preservation often yields less accurate mining results. Furthermore, with the rapid advancement of Internet and telecommunication technology, data types have transformed...

Output Privacy Protection With Pattern-Based Heuristic Algorithm

Privacy Preserving Data Mining (PPDM) is an ongoing research area aimed at bridging the gap between collaborative data mining and data confidentiality. Many different approaches have been adopted for PPDM; of these, the rule hiding approach is used in this article. This approach ensures output privacy, which protects the mined patterns (itemsets) from malicious inference. A...

A Heuristic Approach to Preserve Privacy in Stream Data with Classification

Data stream mining is a new era in the data mining field. Numerous algorithms are used to extract knowledge from and classify stream data. Data stream mining gives rise to a threat to data privacy. Traditional algorithms are not appropriate for stream data because of its large scale. Building a classification model at large scale also imposes time constraints that are not fulfilled by traditional al...

Analysis of Email Fraud detection using WEKA Tool

Data mining is also useful for providing solutions for intrusion detection and auditing. While data mining has several applications in security, there are also serious privacy concerns. Because of email mining, even inexperienced users can connect data and make sensitive associations. Therefore we must enforce the privacy of individuals while working on practical data mining. Using K-mean cluste...

Publication date: 2007